NSF PAR Search | NSF Public Access Repository

Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning Using Passive Langevin Dynamics

https://doi.org/10.1109/TIT.2025.3555479

Snow, Luke; Krishnamurthy, Vikram (June 2025, IEEE Transactions on Information Theory)

This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm. This algorithm is designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing their estimates in real time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (forward learner), over which it has no control. The PSGLD algorithm acts as a randomized sampler to achieve adaptive IRL by reconstructing the forward learner’s cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance; we provide explicit bounds on the 2-Wasserstein distance between the PSGLD algorithm's sample measure and the stationary measure encoding the cost function, and provide guarantees for a kernel density estimation scheme which reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance. They provide finite-time guarantees for an adaptive IRL mechanism, and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.
Free, publicly-accessible full text available June 1, 2026
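
The abstract above describes sampling from a Langevin stationary measure and recovering the cost function by kernel density estimation. The following is a minimal sketch of that general idea only, not the paper's PSGLD algorithm: it runs a plain unadjusted Langevin iteration in which the sampler queries a noisy gradient at its own iterate, whereas the passive setting described above has no such control. The quadratic cost, the noise level, the inverse temperature `beta`, the step size, the bandwidth, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def langevin_samples(noisy_grad, beta, step, n_steps, rng, theta0=0.0):
    """Unadjusted Langevin iterates in one dimension.

    For small steps the stationary law is approximately proportional to
    exp(-beta * c(theta)), where noisy_grad estimates the gradient of c.
    """
    theta = theta0
    samples = np.empty(n_steps)
    noise_scale = np.sqrt(2.0 * step / beta)
    for k in range(n_steps):
        theta = theta - step * noisy_grad(theta, rng) + noise_scale * rng.standard_normal()
        samples[k] = theta
    return samples


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate evaluated on a 1-D grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))


def reconstruct_cost(samples, grid, beta, bandwidth):
    """Invert density ~ exp(-beta * c): c = -log(density) / beta, up to a constant."""
    density = gaussian_kde(samples, grid, bandwidth)
    cost = -np.log(density) / beta
    return cost - cost.min()  # fix the free additive constant


def noisy_grad(theta, rng):
    """Assumed toy gradient: c(theta) = theta**2 / 2, plus Gaussian noise."""
    return theta + 0.5 * rng.standard_normal()


rng = np.random.default_rng(0)
beta = 1.0  # assumed inverse temperature
chain = langevin_samples(noisy_grad, beta, step=0.1, n_steps=60_000, rng=rng)
samples = chain[5_000:]  # discard burn-in
grid = np.linspace(-2.0, 2.0, 41)
cost_hat = reconstruct_cost(samples, grid, beta, bandwidth=0.3)
```

In this toy setup `cost_hat` is a smoothed estimate of the cost shape, identified only up to an additive constant. The discretization, the gradient noise, and the kernel bandwidth each flatten the estimate somewhat relative to the true quadratic, which is the kind of finite-sample error the abstract says the paper bounds.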
Lyapunov based Stochastic Stability of Human-Machine Interaction: A Quantum Decision System Approach

https://doi.org/10.1109/CDC51059.2022.9992472

Snow, Luke; Jain, Shashwat; Krishnamurthy, Vikram (January 2023, 2022 IEEE 61st Conference on Decision and Control (CDC))

In mathematical psychology, decision makers are modeled using the Lindbladian equations from quantum mechanics to capture important human-centric features such as order effects and violation of the sure-thing principle. We consider human-machine interaction involving a quantum decision maker (human) and a controller (machine). Given a sequence of human decisions over time, how can the controller dynamically provide input messages to adapt these decisions so as to converge to a specific decision? We show via novel stochastic Lyapunov arguments how the Lindbladian dynamics of the quantum decision maker can be controlled to converge to a specific decision asymptotically. Our methodology yields a useful mathematical framework for human-sensor decision making. The stochastic Lyapunov results are also of independent interest as they generalize recent results in the literature.
Full Text Available
Quickest Change Detection using Time Inconsistent Anticipatory and Quantum Decision Modeling

https://doi.org/10.1109/Allerton49937.2022.9929427

Krishnamurthy, Vikram; Snow, Luke (September 2022, 2022 58th Annual Allerton Conference on Communication, Control, and Computing)

This paper considers a quickest detection scheme in which the change in an underlying parameter influencing human decisions is to be detected by only observing the human decisions. Stemming from behavioral economics and mathematical psychology, we propose two generative models for the human decision maker. Namely, we consider an anticipatory decision making model and a quantum decision model. From a decision-theoretic point of view, anticipatory models are time inconsistent, meaning that Bellman's principle of optimality does not hold. The appropriate formalism is thus the subgame Nash equilibrium. We show that the interaction between anticipatory agents and sequential quickest detection results in unusual (nonconvex) structure of the quickest change detection policy. In contrast, the quantum decision model, despite its mathematical complexity, results in the typical convex quickest detection policy. The optimal quickest detection policy is shown to perform strictly worse than classical quickest detection for both models, via a Blackwell dominance argument. The model and structural results provided contribute to an understanding of the dynamics of human-sensor interfacing.
Full Text Available